Adaptive Quantization of Neural Networks

Author

  • Jing Li
Abstract

Despite the state-of-the-art accuracy of Deep Neural Networks (DNNs) in various classification problems, their deployment onto resource-constrained edge computing devices remains challenging due to their large size and complexity. Several recent studies have reported remarkable results in reducing this complexity through quantization of DNN models. However, these studies usually do not consider the changes in the loss function when performing quantization, nor do they take into account that different DNN model parameters matter differently to accuracy. We address these issues in this paper by proposing a new method, called adaptive quantization, which simplifies a trained DNN model by finding a unique, optimal precision for each network parameter such that the increase in loss is minimized. The optimization problem at the core of this method iteratively uses the loss function gradient to determine an error margin for each parameter and assigns it a precision accordingly. Since this problem uses linear functions, it is computationally cheap and, as we will show, has a closed-form approximate solution. Experiments on the MNIST, CIFAR, and SVHN datasets showed that the proposed method can achieve near or better than state-of-the-art reduction in model size with similar error rates. Furthermore, it can achieve compressions close to floating-point model compression methods without loss of accuracy.
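The idea in the abstract (use the loss gradient to set a per-parameter error margin, then pick a precision that fits inside it) can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the even split of the loss budget, the weight range of [-1, 1], and the 16-bit cap are all assumptions made here for concreteness.

```python
import numpy as np

def adaptive_quantize(weights, grads, loss_budget=1e-2):
    """Sketch: assign each weight its own precision from its loss sensitivity.

    A first-order Taylor bound says perturbing weight w_i by e_i changes the
    loss by roughly |g_i| * e_i, so a weight with a small gradient magnitude
    tolerates a larger quantization error (i.e., needs fewer bits).
    """
    n = weights.size
    # Per-parameter error margin: split the total loss budget evenly across
    # parameters (an assumption), then divide by the gradient magnitude.
    margins = (loss_budget / n) / (np.abs(grads) + 1e-12)
    # Bits needed so the rounding error (half a step) stays within the margin,
    # assuming weights lie in [-1, 1].
    bits = np.maximum(1, np.ceil(np.log2(2.0 / margins))).astype(int)
    bits = np.minimum(bits, 16)        # cap precision at 16 bits
    steps = 2.0 / (2.0 ** bits)        # quantization step per weight
    quantized = np.round(weights / steps) * steps
    return quantized, bits
```

With this sketch, a weight whose gradient is tiny collapses to a 1-bit representation, while a weight the loss is sensitive to keeps many bits, which is the adaptive, per-parameter behavior the abstract describes.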


Related works

Quantized Back-Propagation: Training Binarized Neural Networks with Quantized Gradients

Binarized Neural Networks (BNNs) have been shown to be effective in improving network efficiency during the inference phase, after the network has been trained. However, BNNs only binarize the model parameters and activations during propagations. We show there is no inherent difficulty in training BNNs using "Quantized Back-Propagation" (QBP), in which we also quantize the error gradients and i...

Full text

Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks

Deep networks commonly perform better than shallow ones (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016), but allocating the proper amount of computation for each particular input sample remains an open problem. This issue is particularly challenging in sequential tasks, where the required complexity may vary for different tokens in the input sequence. Adaptive Computation...

Full text

Simulating Action Dynamics with Neural Process Networks (published as a conference paper at ICLR 2018)

Understanding procedural language requires anticipating the causal effects of actions, even when they are not explicitly stated. In this work, we introduce Neural Process Networks to understand procedural text through (neural) simulation of action dynamics. Our model complements existing memory architectures with dynamic entity tracking by explicitly modeling actions as state transformers. The ...

Full text

Adaptive, Best-Effort Delivery of Live Audio and Video Across Packet-Switched Networks (video)

This videotape is a demonstration of a transport protocol developed by the authors for the transmission of live audio and video streams. The goal of this work has been to understand the complexity of supporting applications such as desktop video conferencing when the network does not support real-time communication. We believe this problem is important because such networks will likely exist fo...

Full text

Adaptive spread transform QIM watermarking algorithm based on improved perceptual models

The quantization step is one of the most important factors that affect the performance of quantization watermarking used for image copyright protection. According to the characteristics of the perceptual model and the specific attacks, an improved perceptual model and different implementations of the perceptual model... Keywords: quantization watermarking, perceptual model, spread transform, quantization index modul...

Full text


Journal:

Volume   Issue 

Pages  -

Publication year: 2018